Single Points of Failure





“Put all your eggs in one basket, and then watch that basket!”
- Mark Twain

These tutorials are a simplified introduction, and are not sufficient on their own to achieve system safety. You are responsible for the safety of your system.

© 2020 Philip Koopman
■ Anti-Patterns for Critical Software:
  ● Hardware single points of failure
  ● Correlated, accumulated multi-point failures
  ● Making assumptions about failures
  ● Non-diverse, low-SIL software


■ Fault Containment Region (FCR)
  ● Faults from outside the FCR are kept out
    - Faults inside the FCR are kept in
  ● But within an FCR, a single fault can have arbitrarily bad effects
    - It's like a shotgun blast inside the FCR
    - Applies to both SW faults and HW faults (e.g., single event upsets)
Toyota Unintended Acceleration (UA)

■ Perhaps 89 deaths, hundreds of serious injury lawsuits
  ● $1.6B class action settlement
  ● Jury found system defective
    - Toyota “acted in reckless disregard”
  ● Many of the issues were SW, but there was also a HW problem:
■ Two accelerator inputs
  ● But: shared A/D converter
  ● Could result in an electronically “stuck” accelerator pedal

[Figure: Toyota Electronic Throttle Control block diagram (source: NASA). Labels: Sensors; VPA: Accelerator Pedal Position; VPA2; VTA: Throttle Position; Vehicle Speed; Digital Input & A/D Conversion; Single Point of Failure.]
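
The shared A/D converter problem can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a description of Toyota's actual design: the `AdcConverter` class, the `read_pedal` helper, the voltages, and the tolerance are all hypothetical. It shows why cross-checking two pedal sensors does not help when both readings pass through the same faulty converter.

```python
from typing import Optional, Tuple


class AdcConverter:
    """Toy A/D converter (hypothetical): returns the input unless latched."""

    def __init__(self) -> None:
        # Set to a voltage to simulate a converter stuck at that value.
        self.latched_value: Optional[float] = None

    def convert(self, volts: float) -> float:
        if self.latched_value is not None:
            return self.latched_value
        return volts


def read_pedal(adc_a: AdcConverter, adc_b: AdcConverter,
               sensor_a_volts: float, sensor_b_volts: float,
               tolerance: float = 0.1) -> Tuple[float, float, bool]:
    """Convert both sensor signals and report whether the readings agree."""
    a = adc_a.convert(sensor_a_volts)
    b = adc_b.convert(sensor_b_volts)
    return a, b, abs(a - b) <= tolerance


# Assumed scenario: the driver has released the pedal (low sensor voltage).
RELEASED = 0.8

# Case 1: both sensors share one converter, and that converter is stuck.
shared = AdcConverter()
shared.latched_value = 3.5
shared_result = read_pedal(shared, shared, RELEASED, RELEASED)

# Case 2: each sensor has its own converter, and only one is stuck.
adc_1, adc_2 = AdcConverter(), AdcConverter()
adc_1.latched_value = 3.5
separate_result = read_pedal(adc_1, adc_2, RELEASED, RELEASED)

print(shared_result)    # (3.5, 3.5, True): readings agree, fault is hidden
print(separate_result)  # (3.5, 0.8, False): disagreement exposes the fault
```

In the shared case the two "redundant" readings are really one reading, so the comparison always passes. With separate converters the same fault produces a disagreement that the system can act on.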
■ Multiple FCRs required for life-critical and highly mission-critical systems
  ● This isolates faults in redundant components: no single point of failure
  ● Avoid an Achilles’ Heel in your system
    - All software on a CPU can be a “single point”
■ Multi-channel (e.g., 2 of 2)
  ● Compare identical component outputs
■ Doer/Checker (monitor/actuator pair)
  ● “Checker” makes sure “Doer” is safe
■ Safety gate
  ● Only permits safe outputs to issue
[Figure: block diagram. Labels: Redundant Inputs; Monitor; Safety Shutdown; Safety Gate; Enable Outputs; Outputs.]
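
The multi-channel, doer/checker, and safety gate patterns can be sketched in a few lines. This is an illustration of the logic only, under assumptions: the function names, `SAFE_OUTPUT`, the tolerance, and the envelope rule in `envelope_checker` are hypothetical. In a real system each channel, the doer, and the checker would run in separate FCRs (typically separate chips), which a single Python process cannot show.

```python
from typing import Callable, Optional

SAFE_OUTPUT = 0.0  # assumed safe state, e.g. throttle closed


def two_of_two(channel_a: float, channel_b: float,
               tolerance: float = 0.01) -> Optional[float]:
    """Multi-channel (2 of 2): issue an output only if both channels agree."""
    if abs(channel_a - channel_b) <= tolerance:
        return channel_a
    return None  # disagreement: caller should shut down


def safety_gate(doer_output: float, checker_ok: bool) -> float:
    """Only permit the doer's output to issue when the checker approves."""
    return doer_output if checker_ok else SAFE_OUTPUT


def run_cycle(pedal_fraction: float,
              doer: Callable[[float], float],
              checker: Callable[[float, float], bool]) -> float:
    """One control cycle of a doer/checker pair behind a safety gate."""
    command = doer(pedal_fraction)
    return safety_gate(command, checker(pedal_fraction, command))


def good_doer(pedal_fraction: float) -> float:
    return pedal_fraction * 100.0  # throttle command in percent


def faulty_doer(pedal_fraction: float) -> float:
    return 100.0  # simulated fault: always commands full throttle


def envelope_checker(pedal_fraction: float, command: float) -> bool:
    # Hypothetical safety envelope: command may exceed the request by 5%.
    return 0.0 <= command <= pedal_fraction * 100.0 + 5.0


normal = run_cycle(0.5, good_doer, envelope_checker)     # command passes
blocked = run_cycle(0.5, faulty_doer, envelope_checker)  # gate forces safe state
```

The checker here is deliberately simpler than the doer: it does not compute the right answer, it only decides whether the doer's answer is safe.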
■ Correlated faults if multiple FCRs are likely to fail together
  ● Common design faults (including software)
  ● Common manufacturing faults
  ● Shared infrastructure (e.g., power, clock)
  ● Physical coupling
    - Shared wiring harness, connectors
    - Shared location (e.g., hot spot)
■ Accumulated faults
  ● Fault not detected
  ● Fault not repaired before next mission

[Image credit: USAF, https://goo.gl/df5pdg]
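
Accumulated faults are easy to miss because redundancy masks the first failure. The sketch below is a hypothetical model (the `RedundantPair` class and its method names are assumptions, not from the slides). It shows a redundant pair that still produces output after one channel fails, which is exactly why a separate check before each mission is needed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RedundantPair:
    """Toy model of two redundant channels; True means the channel works."""
    healthy: List[bool] = field(default_factory=lambda: [True, True])

    def fail_channel(self, index: int) -> None:
        self.healthy[index] = False

    def output_available(self) -> bool:
        # Redundancy masks a single fault: one working channel is enough.
        return any(self.healthy)

    def ready_for_mission(self) -> bool:
        # Pre-mission check: refuse to start with any latent fault.
        return all(self.healthy)


pair = RedundantPair()
pair.fail_channel(0)                 # first fault occurs during a mission
masked = pair.output_available()     # True: the system still appears fine
ready = pair.ready_for_mission()     # False: the latent fault is caught
pair.fail_channel(1)                 # second fault, if the first was ignored
lost = not pair.output_available()   # True: no redundancy was left
```

Without the `ready_for_mission` check, the system would start its next mission with no redundancy left, and the second fault would take it down.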
■ Safety is improved by using multiple FCRs
  ● Hardware redundancy / HW isolation
    - Typically each FCR should be an independent chip
  ● Software must be practically “perfect”
  ● Common patterns: multi-channel, checker, safety gate
■ Pitfalls are numerous and sometimes subtle
  ● Two copies of the same SW fail the same way
  ● Ensure multi-channel doesn't fail as “always trust one channel”
  ● Ensure the checker doesn't fail as “always checks OK”
  ● Look for hidden correlation (HW design defects, shared libraries, shared requirement defects, physical connection, shared clock, shared power, ...)
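
One way to guard against a checker that “always checks OK” is to test it with known-bad inputs. The following is a minimal sketch; the helper name `checker_is_effective`, the range limits, and the test cases are hypothetical. A checker passes only if it accepts the good cases and rejects every injected fault.

```python
from typing import Callable, Iterable


def checker_is_effective(checker: Callable[[float], bool],
                         good_cases: Iterable[float],
                         bad_cases: Iterable[float]) -> bool:
    """True only if the checker accepts all good cases and rejects all bad ones."""
    accepts_good = all(checker(value) for value in good_cases)
    rejects_bad = not any(checker(value) for value in bad_cases)
    return accepts_good and rejects_bad


def range_checker(command: float) -> bool:
    # Hypothetical rule: a throttle command must lie within 0 to 100 percent.
    return 0.0 <= command <= 100.0


def broken_checker(command: float) -> bool:
    # Simulated defect: the checker has failed into "always checks OK".
    return True


GOOD = [0.0, 50.0, 100.0]
BAD = [-1.0, 100.5, 1e6]  # injected faults the checker must catch

range_ok = checker_is_effective(range_checker, GOOD, BAD)
broken_ok = checker_is_effective(broken_checker, GOOD, BAD)
print(range_ok, broken_ok)  # True False
```

Testing only with good cases would pass both checkers, so the injected bad cases are what expose the broken one.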
ERROR

IF YOU'RE SEEING THIS, THE CODE IS IN WHAT
I THOUGHT WAS AN UNREACHABLE STATE.

I COULD GIVE YOU ADVICE FOR WHAT TO DO.
BUT HONESTLY, WHY SHOULD YOU TRUST ME?
I CLEARLY SCREWED THIS UP. I'M WRITING A
MESSAGE THAT SHOULD NEVER APPEAR, YET
I KNOW IT WILL PROBABLY APPEAR SOMEDAY.

ON A DEEP LEVEL, I KNOW I'M NOT
UP TO THIS TASK. I'M SO SORRY.

https://xkcd.com/2200/